CS 545 Homework III (by Oğulcan Emre Örsel)

Let's import the libraries

Defining Functions

PART I

Problem I.(a)

Problem I.(b)

Problem I.(c)

Problem I.(d)

PART II

Problem 2

Let's plot a test image

We then randomly select samples from our data set
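A minimal sketch of this selection step, assuming the data set is a NumPy array of flattened images (the array name and shapes are illustrative, not from the actual notebook):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data matrix: 1000 samples, each a flattened 28x28 image.
data = rng.standard_normal((1000, 784))

n_train = 200
# Draw training indices without replacement so no sample is picked twice.
idx = rng.choice(data.shape[0], size=n_train, replace=False)
train = data[idx]
test = np.delete(data, idx, axis=0)  # the remaining samples form the test set

print(train.shape, test.shape)  # (200, 784) (800, 784)
```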

Writing functions

Testing with 5 dimensions

Testing with 10 dimensions

Testing with 20 dimensions

Testing with 30 dimensions

Another random initialization for 30 dimensions

From the examples above, we observe that the overall performance of the classifier increases with the number of dimensions, indicating that compression to very few dimensions raises the error rate (after 30 dimensions we reach a 90% success rate). Additionally, we see that the choice of training data is also an important factor (though not as major as the compression dimension) in classifier performance, so the training data should be chosen carefully as well.
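The trend above can be reproduced on synthetic data: project onto the top-k principal components and classify in the compressed space, sweeping k. This sketch uses a nearest-class-mean rule and made-up two-class data as a stand-in for the images; all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the image data: two classes in 100-D whose
# means differ along the first 10 coordinates.
d = 100
shift = np.zeros(d)
shift[:10] = 2.0
X = np.vstack([rng.standard_normal((300, d)),
               rng.standard_normal((300, d)) + shift])
y = np.repeat([0, 1], 300)

# Random train/test split.
perm = rng.permutation(len(X))
tr, te = perm[:400], perm[400:]
mean = X[tr].mean(0)

# PCA directions from the training data via SVD of the centered matrix.
_, _, Vt = np.linalg.svd(X[tr] - mean, full_matrices=False)

def accuracy(k):
    P = Vt[:k].T                      # top-k principal directions
    Ztr, Zte = (X[tr] - mean) @ P, (X[te] - mean) @ P
    m0 = Ztr[y[tr] == 0].mean(0)      # class means in the compressed space
    m1 = Ztr[y[tr] == 1].mean(0)
    pred = (np.linalg.norm(Zte - m1, axis=1) <
            np.linalg.norm(Zte - m0, axis=1)).astype(int)
    return (pred == y[te]).mean()

for k in (5, 10, 20, 30):
    print(k, accuracy(k))  # accuracy generally improves as k grows
```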

Problem 3

Importing data

Test 1 for Q3

Test 2 for Q3

Similar to Q2, we observe that increasing the number of dimensions significantly changes the success rate. Beyond what we saw in Q2, here the initial selection of training data plays an important role in the classifier's subsequent success (as can be seen from the success rates). I think this is due to the similarities between speech and music on the bad examples.

Problem 3 My voice & My music

We observe that the success rate is reduced significantly due to the bad training examples compared with the input data. If I had chosen a different input data set, the results would be quite different, indicating that the performance of this classifier (like every classifier) depends on the training data. Also, the performance improves as we increase the feature dimension.
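The sensitivity to the training selection can be illustrated by repeating the same experiment over several random splits and looking at the spread of accuracies. This is a toy sketch with synthetic two-class features standing in for the speech/music data (all names and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Overlapping two-class data as a stand-in for the audio features.
X = np.vstack([rng.standard_normal((200, 20)),
               rng.standard_normal((200, 20)) + 0.8])
y = np.repeat([0, 1], 200)

def run_once(seed, n_train=40):
    # Each seed gives a different random training selection.
    r = np.random.default_rng(seed)
    idx = r.permutation(len(X))
    tr, te = idx[:n_train], idx[n_train:]
    m0 = X[tr][y[tr] == 0].mean(0)
    m1 = X[tr][y[tr] == 1].mean(0)
    pred = (np.linalg.norm(X[te] - m1, axis=1) <
            np.linalg.norm(X[te] - m0, axis=1)).astype(int)
    return (pred == y[te]).mean()

accs = [run_once(s) for s in range(10)]
print(min(accs), max(accs))  # the spread shows dependence on the training split
```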

Problem 4

Here I will be using two different training data to show their effect on the classification

REAL CODE STARTS HERE (ABOVE IS JUST SAMPLE GENERATION)

Training Set I

Training Set II

Moving with the set I

We see that our classifier is successful at detecting some of the pools; however, some are still missed due to the classifier (Gaussian) and the training data. There are also many wrong classifications in the left image. This is a sign that image problems are relatively complicated (in terms of the representation of the data), so we need a nonlinear classifier.
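The Gaussian classifier referred to here fits a mean and covariance per class and assigns each pixel to the class with the higher log-likelihood. A self-contained sketch on toy 3-D "pixel" features (the class names, means, and covariances are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for pixel feature vectors (e.g., RGB values):
# "pool" pixels and "background" pixels with different statistics.
pool = rng.multivariate_normal([0.2, 0.5, 0.8], 0.01 * np.eye(3), 500)
bg   = rng.multivariate_normal([0.4, 0.5, 0.3], 0.05 * np.eye(3), 500)

def fit_gaussian(X):
    mu = X.mean(0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized
    return mu, cov

def log_likelihood(X, mu, cov):
    # Multivariate Gaussian log-density for each row of X.
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum('ij,jk,ik->i', diff, inv, diff)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

mu_p, cov_p = fit_gaussian(pool)
mu_b, cov_b = fit_gaussian(bg)

# Decision rule: pick the class with the larger log-likelihood.
probe = np.vstack([pool[:10], bg[:10]])
pred_pool = log_likelihood(probe, mu_p, cov_p) > log_likelihood(probe, mu_b, cov_b)
print(pred_pool)  # first half should be mostly True, second half mostly False
```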

Training Set II

From both runs, we observe that the first training set performed better than the second, which indicates that our classifier is prone to error (it depends strongly on the training data). We could address this by using a nonlinear classifier rather than a Gaussian one, or by choosing an even better training set. I believe choosing a nonlinear classifier is the better option.
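One simple nonlinear classifier that could replace the Gaussian model (the homework does not specify which to use, so this choice is an assumption) is k-nearest neighbours, whose decision boundary follows the data rather than a fixed parametric form. A minimal sketch on toy 2-D data:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote among the k nearest training points (Euclidean
    distance) - a nonlinear decision rule with no parametric model."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)  # binary labels 0/1

# Toy data with a curved class boundary: class 0 forms a ring around class 1.
rng = np.random.default_rng(4)
angles = rng.uniform(0, 2 * np.pi, 200)
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)] \
        + 0.3 * rng.standard_normal((200, 2))
inner = 0.5 * rng.standard_normal((200, 2))
X = np.vstack([outer, inner])
y = np.repeat([0, 1], 200)

# Evaluated on the training points themselves (each point is its own
# nearest neighbour), so this is only an illustration of the mechanics.
acc = (knn_predict(X, y, X, k=5) == y).mean()
print(acc)
```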